Learning Domain-Specific Discourse Rules for Information Extraction
Authors
Abstract
This paper describes a system that learns discourse rules for domain-specific analysis of unrestricted text. The goal of discourse analysis in this context is to transform locally identified references to relevant information in the text into a coherent representation of the entire text. This involves a complex series of decisions about merging coreferential objects, filtering out irrelevant information, inferring missing information, and identifying logical relations between domain objects. The Wrap-Up discourse analyzer induces a set of classifiers from a training corpus to handle these discourse decisions. Wrap-Up is fully trainable: it not only determines what classifiers are needed based on domain output specifications, but also automatically selects the features needed by each classifier. Wrap-Up’s classifiers blend linguistic knowledge with real-world domain knowledge.
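The idea of inducing a classifier for one discourse decision and letting training data pick its features can be illustrated with a small sketch. This is not Wrap-Up's implementation: the abstract does not name the learning algorithm, so the sketch substitutes a one-level decision tree (a "stump") chosen by information gain. The feature names (`same_type`, `same_sentence`, `name_overlap`), the labels, and the five training pairs are all hypothetical and were invented for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(examples, labels, feature):
    """Reduction in label entropy obtained by splitting on one feature."""
    remainder = 0.0
    for value in {ex[feature] for ex in examples}:
        subset = [lab for ex, lab in zip(examples, labels)
                  if ex[feature] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def train_stump(examples, labels):
    """Pick the most informative feature and learn one label per value."""
    feature = max(sorted(examples[0]),
                  key=lambda f: information_gain(examples, labels, f))
    votes = {}
    for ex, lab in zip(examples, labels):
        votes.setdefault(ex[feature], Counter())[lab] += 1
    rule = {value: c.most_common(1)[0][0] for value, c in votes.items()}
    default = Counter(labels).most_common(1)[0][0]
    return feature, rule, default


def predict(stump, example):
    """Apply the learned rule, falling back to the majority label."""
    feature, rule, default = stump
    return rule.get(example[feature], default)


# Hypothetical training pairs: should two extracted references be merged?
EXAMPLES = [
    {"same_type": True, "same_sentence": True, "name_overlap": True},
    {"same_type": True, "same_sentence": False, "name_overlap": True},
    {"same_type": True, "same_sentence": True, "name_overlap": False},
    {"same_type": False, "same_sentence": False, "name_overlap": False},
    {"same_type": True, "same_sentence": False, "name_overlap": False},
]
LABELS = ["merge", "merge", "separate", "separate", "separate"]

stump = train_stump(EXAMPLES, LABELS)
print("selected feature:", stump[0])
```

In a full system of the kind the abstract describes, one such classifier would be trained per discourse decision (merging, filtering, inferring, relating), each with features drawn from its own training instances.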
Similar resources
AAAI 1995 Spring Symposium on Empirical Methods in Discourse Interpretation and Generation: Learning Domain-Specific Discourse Rules for Information Extraction
This paper describes a system that learns discourse rules for domain-specific analysis of unrestricted text. The goal of discourse analysis in this context is to transform locally identified references to relevant information in the text into a coherent representation of the entire text. This involves a complex series of decisions about merging coreferential objects, filtering out irrelevant inform...
Recognising Discourse Causality Triggers in the Biomedical Domain
Current domain-specific information extraction systems represent an important resource for biomedical researchers, who need to process vast amounts of knowledge in a short time. Automatic discourse causality recognition can further reduce their workload by suggesting possible causal connections and aiding in the curation of pathway models. We describe here an approach to the automatic identific...
Relational Learning of Pattern-Match Rules for Information Extraction
Information extraction is a form of shallow text processing which locates a specified set of relevant items in natural language documents. Such systems can be useful, but require domain-specific knowledge and rules, and are time-consuming and difficult to build by hand, making information extraction a good testbed for the application of machine learning techniques to natural language processing....
Corpus-Driven Knowledge Acquisition for Discourse Analysis
The availability of large on-line text corpora provides a natural and promising bridge between the worlds of natural language processing (NLP) and machine learning (ML). In recent years, the NLP community has been aggressively investigating statistical techniques to drive part-of-speech taggers, but application-specific text corpora can be used to drive knowledge acquisition at much higher leve...
Ontology-driven discourse analysis for information extraction
This paper presents a novel approach to discourse analysis within information extraction systems. It makes use of DRT as formal representation of the linguistic context as well as of a domain-specific ontology as a basis to compute conceptual relations between extracted events thus establishing discourse coherence. The approach has been implemented within GenIE, an information extraction system...